# Load up the package
if (!require("pacman")) install.packages("pacman", repos="http://cran.r-project.org")
pacman::p_load(neotoma2, Bchron, splines, ggplot2, dplyr) # splines comes with BASE-RNeotoma age-depth
Background
For the last 50,000 years, radiocarbon (14C), with its half-life of 5,730 years, is by far the most common form of radiometric dating. (Beyond 10 half-lives, so much of a radioactive substance has decayed away that it becomes immeasurable.) Radiocarbon is the mainstay of Quaternary dating and archaeology.
In Quaternary paleoecology, radiocarbon dating is expensive – a single sample typically costs $300 to $500 – so usually a given lake-sediment record will have only a scattering (ca. 5 to 30) of radiocarbon dates and other age controls. Other kinds of age controls include volcanic ash layers (tephras), 210Pb (half-life: 22.6 yrs), optically stimulated luminescence (OSL) dates, historic events such as the rise in Ambrosia pollen abundances associated with EuroAmerican land clearance, etc., an age-depth model must be constructed to estimate the age of sediments not directly associated with an age control. In multi-site data syntheses, the number of age controls, their precision, and their likely accuracy are all fundamental indicators of data quality (e.g. Blois et al. 2011; Mottl et al. 2021).
To estimate ages for depths lacking radiocarbon date, an age-depth model is required. Age-depth models are fitted to the available age controls (age estimates with uncertainty for individual depths) and provide estimates of age as a function of depth, for all depths and ages within the temporal bounds of the model.
Here we will gain practice in working with age-depth models of various kind, and assessing their age estimates and uncertainty. We’ll begin with a bit of practice in calibrating radiocarbon years to calendar years and comparing the resulting age estimates from different calibration curves.
We will be using Bchron for calibration and Bayesian age-depth modelling. Notably rbacon is another commonly used package, see Trachsel and Telford (2017) for a discussion on age-depth models. We will also be fitting some interpolation and linear models using BASE-R. Remember you must have packages both installed and loaded, often loading a package is done via the library(packagename) function. The following code using the pacman package takes care of installation and loading in one go.
Radiocarbon callibration and uncertainty
A complication in radiocarbon dating is that the initial calculation of a radiocarbon age assumes, by convention, that the amount of radiocarbon in the atmosphere is constant over time. See Ramsey (2008) for a good overview of 14C dating. This assumption is untrue, so all radiocarbon age estimates must be post-hoc calibrated using a calibration curve that is based on compiling radiocarbon dates of materials that have precise independent age estimates (e.g. tree rings, corals). Yet another complication in radiocarbon dating is that different calibration curves need to be used for the Northern vs. Southern Hemisphere and for the atmosphere vs. oceans, due to different residence times of 14C in these different reservoirs. For example, atmospheric radiocarbon that diffuses into the surface oceans usually will reside for centuries before phytoplankton biologically fix it through photosynthesis, which will lead marine 14C to be depleted (and ‘too old’) relative to atmospheric 14C. Use the wrong calibration curve and your age estimate will be inaccurate!
The IntCal series (IntCal04, IntCal09, IntCal13, IntCal20) is the community standard for calibrating radiocarbon dates to age estimates in calendar years in the northern hemisphere (Reimer et al. 2020). The SHcal series is used for the southern hemisphere (Hogg et al. 2020). The conversion from radiocarbon to calendar years usually further increases the uncertainty of age estimates. Here is an example of a single date:
BchronCalibrate(ages = 11553,
ageSds = 50,
calCurves = 'intcal20') |>
plot(includeCal = TRUE)Calibrating radiocarbon dates in R
Here we’ll experiment with calibrating radiocarbon dates, using various calibration curves. Radiocarbon dated samples come back from the lab with a radiocarbon age and standard deviation, among other information. These two bits of information are used to calibrate the radiocarbon dates to estimated ages. R packages may have useful vignettes (package tutorials) and built-in datasets that provide handy test templates. The following code is modified from the Bchron vignette.
ages <- BchronCalibrate(ages = c(3445, 11553, 7456),
ageSds = c(50, 230, 110),
calCurves = c('intcal20','intcal20','intcal20'))summary(ages)95% Highest density regions for Date1
$`94.4%`
[1] 3572 3834
95% Highest density regions for Date2
$`0.4%`
[1] 13004 13025
$`77.9%`
[1] 13059 13874
$`16.4%`
[1] 13923 14012
95% Highest density regions for Date3
$`94.6%`
[1] 8022 8423
plot(ages)[[1]]
[[2]]
[[3]]
plot(ages, includeCal = TRUE, fillCol = 'red')`height` was translated to `width`.
The output summary indicates the range of the highest density regions (i.e., the most likely ‘real’ age range of the sample). The plot function in Bchron outputs a ggplot object of the high density regions of the most likely ages.
BchronCalibrate() is useful for checking out calibrating dates, but it is not necessary to do this step separately before running the age-depth model because the Bchronology() function includes calibration.
Your turn
The BchronCalibrate() example above uses three inputs: (i) uncalibrated ages (the estimated date returned from the lab); (ii) a standard deviation associated with each of the ages; (iii) a calibration curve associated with each of the dates. Take 5 minutes to:
- Experiment with calibrating a few different ages and standard deviations
- try a few single ages and plot the calibration curve as shown in Section 1.1
- also try a few dates at once as in Section 2
- check the function documentation for different calibration curves and try the southern hemisphere calibration.
Give a ✋ in zoom if you have questions and ✅ when you have managed to calibrate and inspect a few different dates.
We will go through a couple of examples together.
Types of Age-Depth Models
Different kinds of age-depth models exist, each with their own underlying assumptions and behavior. In the list below, #1-4 are classical or traditional forms of age-depth models, but Bayesian models are now the norm. The packages rbacon (usually referred to as ‘bacon’) and Bchron are the current standards for Bayesian age-depth modelling. Before going to bayesian models, we’ll begin with the classics.
Linear interpolation, a.k.a. ‘connect the dots,’ in which straight lines are drawn between each depth-adjacent pair of age controls.
Linear regression (\(y=b0~ + b1x\); \(y=\)time and \(x=\)depth; \(b0\) and \(b1\) are constants), in which a single straight line is fitted through the entire set of age controls. In ordinary linear regression (OLS), the chosen line will minimize the y-dimension distances of individual points to the line. Standard OLS assumes that all observations are normally distributed, which is a poor assumption for calibrated radiocarbon dates.
Polynomials, also fitted to the entire set of age controls (\(y= b0 + b1x + b2x^2 + b3x^3 + …bnx^n\)), are an extension of linear regression, with additional terms for \(x^2\), \(x^3\), etc. Some arbitrary maximum n is chosen, usually in the range of 3 to 5. These are called ‘third-order polynomials,’ ‘fifth-order polynomials,’ etc.
Splines, which are a special kind of polynomial function that are locally fitted to subsets of the age controls, and then smoothly interpolated between points. (Several different formulas can be used to generate splines; common forms include cubic, smooth, monotonic, and LOWESS).
Bayesian age models (e.g.
bacon,bchron,oxcal, etc.). Bayesian models differ in detail, but all combine a statistical representation of prior knowledge with the new data (i.e. the age controls at a site) to build an age-depth model with posterior estimates of age for any given depth. Bayesian models are now widely used because:- they allow the incorporation of prior knowledge (e.g., from coring many lakes, we now have decent estimates of typical sediment accumulation rates, Goring et al. (2012));
- they can handle highly irregular probability distribution functions such as those for radiocarbon dates after calibration; and as a result
- they generally do a better job of describing uncertainty than traditional age-depth models.
Classical age-depth models
Classical models are now out-dated methods (Blaauw et al. 2018), but it is useful to understand how they work as literature before the relatively recent development of Bayesian methods has relied on them. Let’s explore some classical methods of age-depth modelling using one of the datasets included with the Bchron package. The data are from a core in Northern Ireland; Sluggan Bog Smith and Goddard (1991), and can be called via:
data(Sluggan) # Call the data from Bchron
print(Sluggan) # Check out the dataLinear interpolation predicts ages by simply, drawing a line between successive dated samples. This method assumes that there is a constant age-depth relationship between samples. An assumption that is unlikely to be true, especially of cores with fewer dated samples than Sluggan Moss.
interp_ages <- approx(x = Sluggan$ages, y = Sluggan$position) # use the function approx() to interpolate between ages
plot(x = interp_ages$x, y = interp_ages$y, type = 'l') # Plot the interpolated data
points(x = Sluggan$ages, y = Sluggan$position, cex = 1.5, col = 'red') # overlay the original age pointsLinear regression provides a line of best fit through the dated samples. This method assumes a constant age-depth relationship across all samples, also unlikely to be true depending on processes affecting the core during its formation.
mod_ages <- lm(Sluggan$position ~ Sluggan$ages) # Create a linear regression model
plot(x = Sluggan$ages, y = Sluggan$position, cex = 1.5, col = 'red') # Plot the original ages
abline(mod_ages) # add the regression line from the regression modelPolynomial regression allows a curve to be fit through the data. The amount the curve ‘wiggles’ depends on the order of the polynomial fit to the data. Polynomial regression has the risk of being over-fit.
x <- Sluggan$ages # Renaming the variables because the predict function below is fussy about the input name
y <- Sluggan$position
poly_ages <- lm(y ~ poly(x, 3))
plot(x = Sluggan$ages, y = Sluggan$position, cex = 1.5, col = 'red')
age_range <- seq(from = range(Sluggan$ages)[1], to = range(Sluggan$ages)[2], length.out = 250)
lines(age_range, predict(poly_ages, data.frame(x = age_range)))Splines are a class of functions including, for example, smoothing splines or cubic splines. Cubic splines are pieve-wise polynomials locally between ‘knots’. That is the data are split into bins that are fit independently using. By default, the bs() function uses a third degree polynomial. Without providing knots the fit will look the same as a third degree polynomial regression.
cubic_ages <- lm(y ~ bs(x, knots = c(1000, 6000, 12000)))
plot(x = Sluggan$ages, y = Sluggan$position, cex = 1.5, col = 'red')
lines(age_range, predict(cubic_ages, data.frame(x = age_range)))Because classical age-depth modelling is rarely used now we are not going to delve further into the statistical details of the best way of fitting each model (e.g., the number of knots to use for fitting a cubic spline).
One of the issues with classical age-depth modelling is that uncertainty decreases with fewer datapoints.
Bayesian age-depth models
Now let’s see what the latest methods show for the same dataset. The Sluggan dataset has a lot of radiocarbon dates! We are going to experiment by reducing the number and placement of radiocarbon samples through the core and seeing the affect on the model prediction and uncertainty.
Note that all the values provided to the arguments are contained in the Sluggan dataframe. When creating chronologies from your own (or accessed data), you may need to rename them to match your data.
set.seed(1984) # The models are stochastic, good idea to set the seed for reproducibility
SlugganOut = with(Sluggan,
Bchronology(ages=ages,
ageSds=ageSds,
calCurves=calCurves,
positions=position,
positionThicknesses=thickness,
ids=id,
predictPositions=c(
seq(min(Sluggan$position), max(Sluggan$position), by=10),
518)
)
)The summary shows for each position (depth) the median and quartiles of the predicted ages for that position.
summary(SlugganOut)Quantiles of predicted ages by depth:
Depth 2.5% 25% 50% 75% 97.5%
44.5 630.800 824.00 883.5 942.25 1039.025
54.5 1057.975 1169.00 1232.5 1300.00 1454.025
64.5 1198.825 1337.00 1404.0 1473.00 1587.050
74.5 1458.975 1574.75 1642.0 1722.50 1918.025
84.5 1559.875 1718.75 1816.0 1898.00 2104.000
94.5 1708.950 1901.75 1978.0 2050.00 2220.000
104.5 2064.975 2169.00 2280.5 2381.00 2672.025
114.5 2227.925 2512.00 2648.5 2789.00 3028.050
124.5 2779.875 2952.00 3028.0 3117.00 3303.025
134.5 3052.000 3253.00 3394.0 3565.00 3916.250
144.5 3212.950 3496.00 3689.5 3870.50 4155.025
154.5 3450.750 3803.00 3976.0 4114.25 4314.000
164.5 3996.950 4200.00 4287.0 4372.00 4499.000
174.5 4265.975 4407.00 4485.5 4556.00 4690.075
184.5 4440.950 4571.00 4643.0 4718.00 4848.050
194.5 4553.975 4702.00 4777.0 4855.00 4994.025
204.5 4634.950 4794.00 4865.5 4942.00 5080.050
214.5 4716.950 4878.00 4953.0 5025.25 5156.100
224.5 4834.975 4978.00 5047.0 5109.00 5240.000
234.5 5210.000 5350.00 5422.0 5481.00 5588.225
244.5 5560.975 5678.75 5729.5 5787.00 5910.075
254.5 5665.975 5769.00 5829.5 5891.00 6003.025
264.5 5752.925 5878.00 5928.0 5980.00 6077.125
274.5 6089.925 6219.00 6279.5 6365.00 6756.125
284.5 6368.975 6656.50 6853.5 7069.50 7440.050
294.5 6701.625 7233.75 7392.0 7497.25 7661.075
304.5 7573.925 7714.00 7808.0 7942.25 8227.000
314.5 7703.975 7936.75 8078.5 8220.25 8459.125
324.5 7919.800 8205.00 8326.5 8447.00 8647.000
334.5 8420.975 8534.00 8615.0 8705.00 8887.025
344.5 8504.950 8647.50 8733.0 8827.00 8992.025
354.5 8594.000 8739.00 8835.0 8929.25 9083.175
364.5 8709.000 8855.00 8955.0 9035.00 9163.050
374.5 9125.000 9266.75 9342.0 9417.00 9575.025
384.5 9220.000 9352.00 9424.0 9516.00 9702.050
394.5 9290.000 9424.50 9508.0 9614.00 9809.125
404.5 9380.900 9509.00 9614.0 9718.25 9939.200
414.5 9555.925 9756.00 9870.0 9990.00 10195.025
424.5 9700.825 9955.00 10120.0 10217.00 10329.025
434.5 9904.875 10148.00 10292.0 10351.25 10496.225
444.5 9982.950 10247.75 10350.0 10419.00 10578.025
454.5 10547.775 10745.75 10833.0 10942.25 11120.125
464.5 10950.875 11221.75 11350.5 11531.00 12001.175
474.5 11311.650 11757.00 12011.0 12212.00 12542.100
484.5 12348.375 12672.00 12762.5 12831.25 12963.125
494.5 12801.000 12916.75 12975.0 13034.25 13160.350
504.5 13069.925 13226.00 13305.0 13394.25 13575.050
514.5 13580.875 13767.50 13873.0 13961.00 14153.200
518.0 13796.950 13996.75 14110.0 14230.00 14563.125
The summary shows possible outliers per date.
summary(SlugganOut, type = "outliers")Posterior outlier probability by date:
Date OutlierProb
UB-201A 0.014
UB-211A 0.013
UB-437 0.004
UB-748 0.005
UB-438 0.003
UB-439 0.006
UB-440 0.009
UB-219D 0.012
UB-219B 0.008
UB-219A 0.010
UB-441 0.010
UB-220D 0.031
UB-220A 0.019
UB-220E 0.018
UB-221A 0.010
UB-749 0.012
UB-223B 0.133
UB-223A 0.020
UB-223D 0.150
UB-442 0.009
UB-443 0.042
UB-225B 0.203
UB-225D 0.011
UB-225F 0.025
UB-444 0.008
UB-227D 0.008
UB-227F 0.015
UB-446 0.009
UB-447 0.004
UB-229D 0.012
UB-229F 0.047
plot(SlugganOut)Commonly, only the median age is reported; however, it is good practice to report and consider the range of possible ages, especially when exploring synchroneity of events across space. Notice how Bchron focuses on delivering ranges and densities.
Your turn
The Sluggan data provide a great in-built working dataset for us to play with. Take 5 minutes to:
- load up the
Sluggandataset (data(Sluggan)) - sample the dataset to reduce the number of rows (e.g., create a new dataset
Sluggan2 <- Sluggan[c(1, 10, 31), ]) - Run and plot the
Bchronage-depth model
Give a ✋ in zoom if you have questions and ✅ when you have managed to calibrate and inspect a few different dates.
Who has a figure they are willing to share?
Let’s discuss how the output changes depending on the placement and number of chronological controls.
- Now edit the
predictPositionsargument. This argument expects a depth sequence. Currently it is set as a sequence for every 10 cm (plus the core bottom at 518 cm). Try different depth resolutions (careful of the run time, if you go for a 1 cm sequence then it will take a while to run!).
Accumulation rate
Now that we have an age-depth model, we can also obtain the accumulation rates:
acc_rate <- summary(SlugganOut,
type = "acc_rate",
probs = c(0.25, 0.5, 0.75)
)And visualise them:
ggplot2::ggplot(acc_rate, aes(
x = age_grid, y = `50%`,
ymax = `75%`, ymin = `25%`)) +
geom_line() +
geom_ribbon(alpha = 0.2) +
theme_bw() +
scale_x_reverse() +
labs(
y = "cm per year",
x = "Age (cal years BP)"
)Neotoma example
So far we have used the built-in dataset from the northern hemisphere for the sake of demonstration. Let’s now follow an example by pulling data from the Neotoma database and model age-depth in the southern hemisphere. Since this tutorial is focused on age-depth modelling we won’t explore obtaining data from Neotoma in any detail, we will just have a look at rebuilding a chronology from Neotoma sites. The Neotoma github repository has links to recent workshops for learning how to use the neotoma2 R package.
The following code will download site metadata from the Neotoma database and plot the sites on a map. I have also saved some example datasets in the data directory just in case the Neotoma API is under maintenance or there are download speed issues.
if (!file.exists(file.path("./data/some_sites.rds"))) {
site_ids <- c(10063, 1639, 10136, 28408, 9865, 28489, 31732)
some_sites <- neotoma2::get_sites(site_ids)
saveRDS(some_sites, "./data/some_sites.rds")
} else {
some_sites <- readRDS("./data/some_sites.rds")
neotoma2::plotLeaflet(some_sites)
}The next chunk of code uses the metadata to filter for sites with pollen records (and some time-span, i.e., not surface samples) and downloads the data.
if (!file.exists(file.path("./data/some_datasets.rds"))) {
some_datasets <-
neotoma2::get_datasets(some_sites, all_data = TRUE) |>
neotoma2::filter(datasettype == "pollen" & !is.na(age_range_young)) |>
neotoma2::get_downloads()
saveRDS(some_datasets, "./data/some_datasets.rds")
} else {
some_datasets <- readRDS("./data/some_datasets.rds")
}Now we have some sites that we can explore, let’s rebuild a chronology. Note, that each core may have more than one chronology associated with it. We can check how many chronologies are associated with a site by the chronologyid.
# Meta data for each chronology
controls_meta <- neotoma2::chronologies(some_datasets[[1]])
# Dataframe of the chronologial conrols
controls <- neotoma2::chroncontrols(some_datasets[[1]]) |>
dplyr::arrange(depth)
unique(controls$chronologyid)[1] 777 33123
See that this core has two chronologies? I’m picking the second one in this case, but you probably will want to see associated publications if listed.
controls <- neotoma2::chroncontrols(some_datasets[[1]]) |>
dplyr::filter(chronologyid == 33123) |> # If there is more than on chronology
dplyr::arrange(depth)Note that the controls dataframe contains all the information necessary to build a new chronology. The data are not in the identical format as the in-build Sluggan dataset, so some very minor adjustment is necessary. The Sluggan data has one field for standard deviation, the Neotoma data has an upper (agelimitolder) and lower (agelimityounger) limit. These two variables are the age (chroncontrolage) plus and minus the standard deviation. So the standard deviation is the difference between the age and the limits (abs(controls$agelimityounger - controls$chroncontrolage)). There is also no calibration curves field. Both the standard deviation and the calcurves can be added to the dataframe.
controls <- controls |>
dplyr::mutate(ageSds = abs(controls$agelimityounger - controls$chroncontrolage),
calCurves = c("normal", rep("shcal20", 9)))It is good practice to use set.seed(...) before running an age-depth model. The models are stochastic and will give slightly different results each time. For reproducibility it is good to fix the seed, and also for your own sanity when you re-run your code.
set.seed(1984)
newChron <- Bchron::Bchronology(
ages = controls$chroncontrolage,
ageSds = controls$ageSds,
calCurves = controls$calCurves,
positions = controls$depth,
positionThicknesses = controls$thickness,
ids = controls$chroncontrolid,
predictPositions=
seq(min(controls$depth), max(controls$depth), by=10))plot(newChron)Your turn
In the remaining time, give a shot at building or rebuilding a chronology. If you have your own data, or would like to download a site of your choice from Neotoma, feel free to do that. Neotoma explorer can help you find a site id.
- bring your own data, pick one of the existing downloaded sites, or find a site on neotoma explorer.
- Build a new chronology!
The end
Age-depth modelling is complicated, there are many pitfalls, assumptions, and uncertainties that are often ignored. Recent developments have begun to focus on quantifying uncertainties to understand the reliability of inferences made from the data. Key papers for understanding age depth modelling include: